R Markdown combines formatted text, code, and output in one document. Great for reproducibility: fewer opportunities for mistakes.
First we attach packages (in the code chunk above).
If you need to install a package: go to the console and type install.packages("packagename")
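The attach chunk itself is not shown in these notes. Judging from the functions used further down (read_csv(), here(), st_as_sf(), tmap_mode()), it presumably loads tidyverse, here, sf, and tmap; that package list is an assumption. A minimal sketch that reports which of them would still need installing:

```r
# Packages this walkthrough appears to rely on (an assumption, not confirmed by the notes)
pkgs <- c("tidyverse", "here", "sf", "tmap")

# requireNamespace() returns TRUE when a package is installed, without attaching it
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install from the console: ", paste(missing_pkgs, collapse = ", "))
}
```

Anything listed can then be installed from the console with install.packages(), as described above.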
Read in the data. (Cmd + Option + I inserts a code chunk; Cmd + Enter runs the current line of code.)
sf_trees <- read_csv(here("data", "sf_trees", "sf_trees.csv"))
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## tree_id = col_double(),
## legal_status = col_character(),
## species = col_character(),
## address = col_character(),
## site_order = col_double(),
## site_info = col_character(),
## caretaker = col_character(),
## date = col_date(format = ""),
## dbh = col_double(),
## plot_size = col_character(),
## latitude = col_double(),
## longitude = col_double()
## )
## Basic wrangling reminders
Refresh data wrangling skills!
Find the five legal status categories with the most tree observations, then do some wrangling and make a graph.
(Cmd + Shift + M inserts the pipe operator, %>%.)
top_5_status <- sf_trees %>%
count(legal_status) %>%
drop_na(legal_status) %>%
rename(tree_count = n) %>%
relocate(tree_count) %>%
slice_max(tree_count, n = 5)
# count() combines group_by(), summarize(), and n() in one step. Super useful
# drop_na() removes any row with a missing (NA) value in the variable you specify
# rename(): the new name goes first, then the old name
# relocate(): tree_count moves to the first column
# slice_max() identifies the rows with the highest values of the variable you specify and keeps only the top n (here, 5)
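To see what "count() combines group_by(), summarize(), and n()" means in practice, here is a minimal sketch on a made-up data frame. The name toy_trees and its values are hypothetical stand-ins for sf_trees.

```r
library(dplyr)

# Hypothetical toy data standing in for sf_trees
toy_trees <- tibble(
  legal_status = c("DPW Maintained", "Permitted Site", "DPW Maintained",
                   NA, "Permitted Site", "DPW Maintained")
)

# One step with count()
counted <- toy_trees %>%
  count(legal_status)

# The same thing written out with group_by() + summarize() + n()
summarized <- toy_trees %>%
  group_by(legal_status) %>%
  summarize(n = n())

# Both give one row per legal_status (NA included) with its count
all.equal(as.data.frame(counted), as.data.frame(summarized))
```

Note that the NA group is still present after count(), which is why the pipeline above follows it with drop_na(legal_status).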
Make a graph of the top 5 legal status categories by tree count
ggplot(data = top_5_status, aes(x = fct_reorder(legal_status, tree_count), y = tree_count)) +
geom_col() +
labs(x = "Legal Status", y = "Tree Count", title = "Test Title") +
coord_flip() +
theme_minimal()
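The fct_reorder() call in aes() is what sorts the bars by count rather than alphabetically. A small sketch of its effect on factor levels, using hypothetical categories and counts:

```r
library(forcats)

# Hypothetical categories and counts, for illustration only
status <- c("A", "B", "C")
tree_count <- c(30, 10, 20)

# Without reordering, factor levels are alphabetical
default_levels <- levels(factor(status))
default_levels

# fct_reorder() orders the levels by tree_count, smallest first
reordered_levels <- levels(fct_reorder(status, tree_count))
reordered_levels
```

Because the levels run from smallest to largest, coord_flip() ends up placing the largest category at the top of the chart.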
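A few more data wrangling refresher examples!
We only want to keep observations (rows) for Blackwood Acacia trees. There is no separate column for scientific and common names, but we can use filter() with str_detect() to keep every row whose species contains "Blackwood Acacia". The match is case-sensitive, so the pattern is capitalized the way it appears in the data.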
blackwood <- sf_trees %>%
filter(str_detect(species, "Blackwood Acacia")) %>%
select(legal_status, date, latitude, longitude)
ggplot(data = blackwood, aes(x = longitude, y = latitude)) +
geom_point()
## Warning: Removed 27 rows containing missing values (geom_point).
# str_detect() looks for a string within the variable that we specify
# select() helps us pick columns
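str_detect() matches case-sensitively by default, which matters when the pattern is typed by hand. A minimal sketch; the species strings below are hypothetical examples in the "scientific :: common" style, not rows copied from the data:

```r
library(stringr)

# Hypothetical species values, for illustration only
species <- c("Acacia melanoxylon :: Blackwood Acacia",
             "Platanus x hispanica :: Sycamore: London Plane")

# Exact capitalization matches the first value only
exact <- str_detect(species, "Blackwood Acacia")

# Lower-case pattern matches nothing, because matching is case-sensitive
lower <- str_detect(species, "blackwood acacia")

# regex(ignore_case = TRUE) makes the match case-insensitive
insensitive <- str_detect(species, regex("blackwood acacia", ignore_case = TRUE))
```

Inside filter(), the case-insensitive version would be a drop-in replacement for the pattern used above.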
Use tidyr::separate() and tidyr::unite(). Useful for splitting one column into several or combining several into one.
sf_trees_sep <- sf_trees %>%
separate(species, into = c("spp_sci", "spp_common"), sep = "::")
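One thing to watch with sep = "::": if the species values have spaces around the separator (an assumption about the data, based on the "scientific :: common" pattern), those spaces stay attached to the new columns. A sketch on a hypothetical one-row data frame:

```r
library(tidyr)

# Hypothetical species value, for illustration only
toy <- data.frame(species = "Acacia melanoxylon :: Blackwood Acacia")

# Splitting on "::" alone keeps the surrounding spaces
sep_plain <- separate(toy, species, into = c("spp_sci", "spp_common"), sep = "::")
sep_plain$spp_sci     # has a trailing space
sep_plain$spp_common  # has a leading space

# Including the spaces in the separator gives clean values
sep_trim <- separate(toy, species, into = c("spp_sci", "spp_common"), sep = " :: ")
sep_trim$spp_sci
sep_trim$spp_common
```

Alternatively, stringr::str_trim() can clean the columns after splitting.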
Example of unite() (not sure why we’d do this!)
sf_trees_unite <- sf_trees %>%
unite("id_status", tree_id:legal_status, sep = "!!!!!")
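To see what unite() actually produces, here is a sketch on a hypothetical two-row data frame. The column names mirror sf_trees, but the values are made up.

```r
library(tidyr)

# Hypothetical rows, for illustration only
toy <- data.frame(
  tree_id = c(101, 102),
  legal_status = c("DPW Maintained", "Permitted Site"),
  species = c("x", "y")
)

# Paste tree_id and legal_status into one column, separated by "_"
united <- unite(toy, "id_status", tree_id:legal_status, sep = "_")

names(united)      # the two source columns are replaced by id_status
united$id_status
```

By default the source columns are dropped; passing remove = FALSE keeps them alongside the new column.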
Make some actual maps of Blackwood Acacia trees in SF.
We’ll use st_as_sf() to convert the latitude and longitude columns to spatial coordinates, then set the coordinate reference system (CRS) to EPSG:4326.
blackwood_spatial <- blackwood %>%
drop_na(longitude, latitude) %>%
st_as_sf(coords = c("longitude", "latitude"))
st_crs(blackwood_spatial) <- 4326
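As an alternative sketch, st_as_sf() also accepts a crs argument, so the conversion and the CRS assignment can happen in one call. The coordinates below are hypothetical:

```r
library(sf)

# Hypothetical coordinates, for illustration only
toy_points <- data.frame(
  longitude = c(-122.45, -122.42),
  latitude  = c(37.76, 37.78)
)

# Convert to an sf object and set the CRS (EPSG:4326, WGS 84) in one step
toy_spatial <- st_as_sf(toy_points,
                        coords = c("longitude", "latitude"),
                        crs = 4326)

# Confirm which CRS was assigned
st_crs(toy_spatial)$epsg
```

Note the order in coords: longitude (x) comes first, then latitude (y).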
ggplot(data = blackwood_spatial) +
geom_sf(color = "darkgreen") +
theme_minimal()
# geom_sf() plots spatial data in ggplot once we've set the coordinate system
# but this is still hard to interpret...
Read in the SF roads data to make this map make more sense.
sf_map <- read_sf(here("data", "sf_map", "tl_2017_06075_roads.shp"))
# Both layers need to be in the same coordinate system. The roads data already has a CRS, so we convert it with st_transform()
# and assign the result, otherwise sf_map is left unchanged
sf_map <- st_transform(sf_map, 4326)
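Whether two layers really share a coordinate system can be checked directly by comparing their st_crs() values. A minimal sketch with hypothetical layers (a point in EPSG:4326 and a line in EPSG:4269, chosen only for illustration):

```r
library(sf)

# Hypothetical point layer in WGS 84 (EPSG:4326)
toy_points <- st_as_sf(
  data.frame(longitude = -122.45, latitude = 37.76),
  coords = c("longitude", "latitude"),
  crs = 4326
)

# Hypothetical road layer in NAD83 (EPSG:4269)
toy_road <- st_sf(
  geometry = st_sfc(
    st_linestring(rbind(c(-122.46, 37.75), c(-122.44, 37.77))),
    crs = 4269
  )
)

# The two layers start out in different coordinate systems
same_before <- st_crs(toy_points) == st_crs(toy_road)

# Transform the roads to whatever CRS the points use, then compare again
toy_road_matched <- st_transform(toy_road, st_crs(toy_points))
same_after <- st_crs(toy_points) == st_crs(toy_road_matched)
```

Passing st_crs(toy_points) instead of a typed EPSG number avoids mistyping the code.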
ggplot(data = sf_map) +
geom_sf()
Now combine tree observations and roads map!
ggplot() +
geom_sf(data = sf_map, size = 0.1, color = "darkgray") +
geom_sf(data = blackwood_spatial, size = 0.4, color = "darkgreen") +
theme_void() +
labs(title = "Blackwood Acacias in San Francisco")
Let’s make this interactive!!
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(blackwood_spatial) +
tm_dots()